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1 Introduction 

In proton-proton collisions at the LHC, interactions take place between the partons of the collid¬ 
ing protons. The scattered partons from hard collisions fragment and hadronize into collimated 
groups of particles called jets. The study of jets with high transverse momentum (pj) provides 
a test of the predictions from quantum chromodynamics (QCD) and deviations from these pre¬ 
dictions can be used to look for physics beyond the standard model. While parton scattering is 
an elementary QCD process that can be calculated from first principles, predictions of jet distri¬ 
butions require an accurate hadronization model. In this paper, several hadronization models 
are examined. 

High-pj parton production is described by perturbative QCD (pQCD) in terms of the scat¬ 
tering cross section convolved with a parton distribution function (PDF) for each parton that 
parametrizes the momentum distribution of partons within the proton. The hard-scattering 
cross section itself can be written as an expansion in the strong coupling constant a s . The lead¬ 
ing term in this expansion corresponds to the emission of two partons. The next term includes 
diagrams where an additional parton is present in the final state as a result of hard-gluon ra¬ 
diation (e.g. gg —> ggg). Cross sections for such processes diverge when any of the three par- 
tons becomes soft or when two of the partons become collinear. Finally, pQCD predicts three 
classes of four-jet events that correspond to the processes qq/gg —> qqgg, qq/ gg —> qqqq and 
qg —> qggg/ qqqg, where q stands for both quarks and anti-quarks. Processes with two or more 
gluons in the final state receive a contribution from the triple-gluon vertex, a consequence of 
the non-Abelian structure of QCD. 

We are studying distributions of topological variables, which are sensitive to QCD color fac¬ 
tors, the spin structure of gluons, and hadronization models. These topological variables were 
studied widely in the earlier LEP IHIZ1 and the Tevatron ||3][4[ experiments and help to validate 
theoretical models implemented in various Monte Carlo (MC) event generators. 

The distributions of multijet variables are sensitive to the treatment of the higher-order pro¬ 
cesses and approximations involved. Many MC event generators make use of leading order 
(LO) matrix elements (ME) in the primary 2—>2 process. A good agreement between the mea¬ 
surements and MC predictions can establish the validity of the treatment of higher-order ef¬ 
fects, and any large deviation may lead to large systematic uncertainties in searches for new 
physics. 

The multijet observables presented here are based on hadronic events from 7 TeV pp collision 
data recorded with the CMS detector corresponding to an integrated luminosity of 5.1 ftD 1 . The 
kinematic and angular properties of these events are computed from the jet momentum four- 
vectors. Unfolding techniques are used to correct for the effects of the detector resolution and 
efficiency. Systematic uncertainties resulting from the limited knowledge of the jet energy scale 
(JES), jet energy and angular resolution (JER), unfolding, and event selection are estimated, and 
the unfolded distributions are compared with predictions of several QCD-based MC models. 

In this paper, the CMS detector is briefly described in Section[2] Sections [3] and [4] summarize the 
MC models used and the variables studied in this paper. Event selection and measurements 
are described in Sections [5] and [6} respectively. The correction of the distributions due to de¬ 
tector effects is discussed in Section [7] Sections [8] and [9] describe the estimation of systematic 
uncertainties and the final results. The overall summary is given in Section [TO] 
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3 Monte Carlo models 


2 The CMS detector 

The central feature of the CMS apparatus is a superconducting solenoid of 6 m internal di¬ 
ameter, providing a magnetic field of 3.8 T. Within the field volume are a silicon pixel and 
strip tracker, a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass and 
scintillator hadron calorimeter (HCAL), each composed of a barrel and two endcap sections. 
Muons are measured in gas-ionization detectors embedded in the steel flux-return yoke out¬ 
side the solenoid. Extensive forward calorimetry complements the coverage provided by the 
barrel and endcap detectors. The barrel and endcap calorimeters cover a pseudorapidity re¬ 
gion —3.0 < rj < 3.0. Pseudorapidity is defined as // = — In tan [0/2], where 6 is the polar 
angle. The transition between barrel and endcaps happens at \rj\ = 1.479 for the ECAL and 
|;/| = 1.15 for the HCAL. The first level (LI) of the CMS trigger system, composed of custom 
hardware processors, uses information from the calorimeters and muon detectors to select the 
most interesting events in a fixed time interval of less than 4 fis. The high-level trigger (HLT) 
processor farm further decreases the event rate from around 100 kHz to around 400 Hz before 
data storage. A more detailed description of the CMS detector, together with a definition of the 
coordinate system used and the relevant kinematic variables, can be found in Ref. [5|. 


3 Monte Carlo models 

The MC event generators rely on models using modified LO QCD calculations. The elementary 
hard process between the partons is computed at LO. The parton shower (PS), used to simulate 
higher-order processes, follows an ordering principle motivated by QCD. Nevertheless, the 
parton shower models can differ in the ordering of emissions and the event generators can also 
have different treatments of beam remnants and multiple interactions. 

The pythia 6.4.26 J6J event generator uses a PS model to simulate higher-order processes [7-9] 
after the LO ME from pQCD calculations. The PS model, ordered by the pj of the emissions, 
provides a good description of event shapes when the emitted partons are close in phase space. 
Events are generated with the Z2 tune ItlOH for the underlying event. This tune is identical to 
the Z1 tune fill , except that it uses CTEQ6L1 Ifl2l PDFs. The partons are hadronized (process 
of converting the partons into measured particles) using the Lund string model [13;, 14], 

The pythia 8.153 Ifl5l event generator also uses a PS model with the successive emissions of 
partons ordered in pj and the Lund string model for hadronization. The main difference be¬ 
tween the two PYTHIA versions is the description of multiparton interactions (MPI). In PYTHIA8, 
initial state radiation (ISR), final state radiation (FSR), and MPI are interleaved in the pj order¬ 
ing, while in PYTHIA6, only ISR and FSR are interleaved. The TUNE4C lH6l is used with this 
generator. This tune uses CTEQ6L1 PDFs with parameters using CDF as well as early LHC 
measurements. 

The HERWIG++ 2.4.2 lH7f TUNE23 fl8l program takes the LO ME and simulates a PS using 
the coherent branching algorithm with angular ordering [fT9l of the showers. The partons are 
hadronized in this model using a cluster model ||20| and the underlying event is simulated 
using the eikonal multiple partonic scattering model. 

In the case of MadGraph 5.1.5.7 H2l"l , multiparton final states are also computed at tree level. 
The parton shower and nonperturbative parts for MADGRAPH 5.1.5.7 simulation sample is 
handled by PYTHIA 6.4.26 with Z2 tune. The MLM matching procedure Ii22l is used to avoid 
double counting between the ME and the PS. The MadGraph samples are created in four bins 
of the variable Hj, the scalar sum of the parton pjs. The matching between ME and PS has 
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been studied in detail and has been validated using inclusive jet pj distributions. Several sam¬ 
ples are generated using different matching parameters and are used in estimating systematic 
uncertainty in the theoretical prediction. 

These MC programs are the most commonly used models to describe multi-partonic final states 
and are normally used to describe QCD background in searches within CMS. The events pro¬ 
duced from these models are simulated using a CMS detector simulation program based on 
GEANT4 Il23l and reconstructed with the same program used for the data. These MC events 
are used for the comparison with the measurements as well as to correct the distributions for 
detector effects. 

4 Definition of variables 

4.1 Three-jet variables 

The topological variables used in this study are defined in the parton or jet centre-of-mass 
(CM) system. The topological properties of the three-parton final state in the CM system can be 
described in terms of five variables 0. Three of the variables reflect partition of the CM energy 
among the three final-state partons. There are three angles, which define the spatial orientation 
in the plane containing the three partons, but only two are independent. 



Figure 1: Illustration of the three-jet variables in the process 1+2—>3+4+5. The scaled energies 
are related to the angles (a*) among the jets for massless parton. 

It is convenient to introduce the notation 1+2—>3+4+5 for the three-parton process. Here, num¬ 
bers 1 and 2 refer to incoming partons while the numbers 3,4, and 5 label the outgoing partons 
in a descending order in energies in the parton CM frame, i.e. E 3 > E 4 > E 5 (Fig. [Tj). The final- 
state parton energy is an obvious choice for the topological variable for the three-parton final 
state. For simplicity, E, (i = 3,4, 5) is often replaced by the scaled variable x t (i = 3,4, 5), which 
is defined by Xj = 2EJ \Js 3 45 , where 1/^345 is the CM energy of the hard-scattering process. It 
is also referred to as the mass of the three-parton system, and by definition, 

x 3 + x 4 + x 5 = 2. ( 1 ) 

The internal structure of the three-parton final state is determined by any two scaled parton 
energies. The third one is calculated using Equation [l] It needs two angular variables which 
fix the event orientation. In total, five independent kinematic variables are needed to describe 
the topological properties of the three-parton final state. In this analysis, however, the study is 
restricted to three variables: \/S 345 , x 3 , and x 4 , while the angular variables are not included. 
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4 Definition of variables 


4.2 Four-jet variables 

To define a four-parton final state in its CM frame, eight independent parameters are needed. 
Two of these define the overall event orientation, while the other six fix the internal structure of 
the four-parton system. In contrast to the three-parton final state, there is no simple relationship 
between the scaled parton energies and the opening angles between partons. Consequently, the 
choice of topological variables is less obvious in this case. Variables are defined here in a way 
similar to those investigated for the three-parton final state. The four partons are ordered in 
descending energy in the parton CM frame and labeled from 3 to 6. The variables include the 
scaled energies and the polar angles of the four partons with respect to the beams. 

In addition to the four-parton CM energy or the mass of the four-parton system (V^3456), two 
angular distributions characterizing the orientation of event planes are investigated. One of 
these is the Bengtsson-Zerwas angle (xbz) 641 defined as the angle between the plane contain¬ 
ing the two leading jets and the plane containing the two nonleading jets: 


(P3 x Pa) ■ (P5 x p 6 ) 

COS^BZ = -T^- J nS - -*T~ 

p3 X p 4 p 5 X p 6 \ 


( 2 ) 


The second variable is the cosine of the Nachtmann-Reiter angle (cos 0 \r) H25l defined as the 
angle between the momentum vector differences of the two leading jets and the two nonleading 
jets: 


cos 0NR = 


(P3 ~ Pa) • (p5 - Pe) 

IPs-/blips-Pel 


(3) 


Figure [2] illustrates the definitions of ^bz and $nr variables. Historically, Xbz and 0 nr were 
proposed for e + e _ collisions to study gluon self-coupling. Their interpretation in pp collisions 
is more complicated, but the variables can be used as a tool for studying the internal structure 
of the four-jet events. 




Figure 2: Illustration of the Bengtsson-Zerwas angle (^bz) and the Nachtmann-Reiter angle 
($nr) definitions for the four-jet events. The left figure shows the Bengtsson-Zerwas angle, 
which is the angle between the plane containing the two leading jets and the plane containing 
the two nonleading jets. The right figure shows the Nachtmann-Reiter angle, which is the angle 
between the momentum vector differences of the two leading jets and the two nonleading jets. 
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5 Data samples and event selection 

Jets are reconstructed from particle-flow (PF) objects Il26l [27j using the anti-/c T clustering al¬ 
gorithm fl28l with the distance parameter R = 0.5, as calculated with Fastjet 2.0 If29l . The 
PF algorithm utilizes the best energy measurements of each particle candidate from the most 
suitable combination of the detector components. A cluster is formed from all the particle- 
flow candidates that satisfy the chosen distance parameter. The four-momentum of the jet is 
then defined as the sum of four-momenta of the corresponding particle-flow candidates, which 
results in jets with nonzero mass. 

The JES correction applied to jets used in this analysis is based on high-px jet events gener¬ 
ated by PYTHIA6 and then simulated using GEANT4, and in situ measurements with dijet and 
photon+jet events H30l . An average of ten minimum bias interactions occur in each pp bunch 
crossing (pileup), and this requires an additional correction to remove the extra energy de¬ 
posited by these pileup events. The size of the correction depends on the pj and ;/ of the jet. 
The correction appears as a multiplicative factor to the jet energy, and is typically less than 1.2 
and approximately uniform in rj. 

Events passing single-jet HLT requirements are used in this analysis. These triggers require jets 
reconstructed from calorimetric information with the anti -kj clustering algorithm and with en¬ 
ergy corrections applied. Jets are ordered in decreasing jet pj, and the leading jet pj is required 
to be above a certain threshold. As offline jets are reconstructed with the PF algorithm, this may 
result in a trigger not being fully efficient near the threshold. Trigger efficiencies are studied as 
a function of the leading jet pj for all trigger thresholds. Values of the leading jet pj, where the 
trigger efficiency is determined to be larger than 99%, are listed in Table [T] It also summarizes 
the prescale factors and the effective integrated luminosities collected using the different HLT 
thresholds. 


Table 1: Prescales, integrated luminosity and offline pj threshold of the leading jet for different 
trigger paths. The terminology for Level 1 (LI) triggers as well as HLT includes the jet pj 
threshold (in GeV) applicable to the trigger. 


Period 

HLT 

LI 

HLT60 

Singlejet36 

HLT110 

Singlejet68 

HLT190 

Singlejet92 

HLT240 

Singlejet92 

HLT370 

Singlejet92/ 

Singlejetl28 

2011A 

LI prescale 

1-300 

1-10 

1 

1 

1 


HLT prescale 

15-180 

1-5000 

1-60 

1-24 

1 


/Cfpb- 1 ) 

0.29 

6.16 

114.7 

392.2 

2328 

2011B 

LI prescale 

50-400 

1-20 

1-10 

1 

1 


HLT prescale 

80-84 

80-1000 

10-100 

4-30 

1 


/£ (pt- 1 ) 

0.12 

1.12 

40.2 

136.0 

2767 

Overall 

TV pb- 1 ) 

0.41 

7.29 

154.8 

528.2 

5096 


pj threshold 

110 GeV 

190 GeV 

300 GeV 

360 GeV 

500 GeV 


Jets are selected with restrictive criteria on the neutral energy fractions (both electromagnetic 
and hadronic components), and all the jets are required to have pj > 50 GeV and absolute 
rapidity, (y = (1/2) ln[(E + p z )/(E — p z )]), \y\ < 2.5. The jet with the highest pj is required 
to be above a threshold as given by the requirement from the trigger turn-on curve. To avoid 
overlap of events from two different HLT paths, the pj of the leading jet is also required to be 
less than an upper value. The overall criteria are summarized in Table [2] Though data from all 
the five trigger paths are studied, figures from two representative trigger paths (the highest pj 
threshold and a lower one with good statistical accuracy) are presented in this paper. 
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7 Corrections for detector effects 


Table 2: Threshold of the leading jet pj for different HLT paths. This paper shows results from 
two representative trigger paths HLT110 and HLT370. 

HLT HLT60 HLT110 HLT190 HLT240 HLT370 

Teading jet p T (GeV) 110-190 190-300 300-360 360-500 >500 


Events are selected with at least three jets passing the selection criteria as stated above. Addi¬ 
tional selection requirements are also applied to reduce backgrounds due to beam halo, cosmic 
rays and detector noise. The event must have at least one good reconstructed vertex |3H . Miss¬ 
ing transverse energy, 1SS , is required to be less than 0.3 Ej, where the summation is over 
all PF jets. The quantities E™ 1SS and £ Ej are obtained from negative vector sum and scalar sum 
of the transverse momenta of the jets, respectively. A number of event filters l32j| accept only 
those events that have negligible noise in the detector. The jets are ordered in decreasing pj, 
and an event with at least three (four) jets satisfying the jet selection criteria is classified as a 
three-jet (four-jet) event. 


6 Measurements 

The 4-momenta of all the jets in the three- or four-jet event category are transformed into the 
CM frame of the three- or four-jet system. The jets are then ordered in decreasing energy. The 
three- and four-jet variables as described in Section[4]are then calculated from the kinematic and 
angular information of the jets. Since detector resolution varies over the potential kinematic 
ranges, variable bin widths are adopted for the jet masses and the scaled jet energies, while for 
angular variables constant bin widths are used. 

6.1 Detector-level distributions 

The measured distributions of the three- and four-jet variables are compared with predictions 
from two MC generators (PYTHIA6 and MADGRAPH+PYTHIA6), simulated using the identical 
detector condition as that in the data. The identical pileup condition is obtained by reweighting 
the MC simulation to match the spectrum of pileup interactions observed in the data. The size 
of the reweighting correction is typically less than 1%. The agreement between the data and 
the MC predictions is reasonable, so these MC generators are used to correct the measured 
distributions. 

Figures [3] and [4] show the normalized three- and four-jet mass distributions. The data are com¬ 
pared with two different MC programs: PYTHIA6 and MADGRAPH+PYTHIA6, each with two 
different HLTs with pj thresholds above 110 and 370 GeV. As can be seen from the figures, 
there is agreement within a few percent between the data and the predictions of these two sim¬ 
ulations. The difference between the predictions and the data varies typically from 4% to 10%. 
However, there is a systematic deviation observed at high masses where the simulations are 
higher than the data. 


7 Corrections for detector effects 

Multijet variables obtained from MC samples may differ from data because of the detector res¬ 
olution and acceptance. Before comparisons with other experiments or theoretical predictions 
can be made, detector effects are unfolded into distributions at the final-state particle level. The 
basic component of the unfolding is the response function, where experimental observables 
are expressed as a function of theoretical observables. For simplicity, observables are taken in 
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Figure 3: The upper panels display the normalized distributions of the reconstructed three-jet 
mass for events where the most forward jet has \y\ < 2.5. Figures differ by pj ranges of the 
leading jet: 190-300 GeV (a), and above 500 GeV (b) for data (before correction due to detector 
effects) and predictions from MC generators. The bottom panel of each plot shows the ratio of 
MC predictions to the data. The data are shown with only statistical uncertainty. 


190 GeV < Leading jet p T < 300 GeV 5.1 fb _1 (7 TeV) 



Leading jet p T > 500 GeV 5.1 fb' 1 (7 TeV) 



Figure 4: The upper panels display the normalized distributions of the reconstructed four-jet 
mass for events where the most forward jet has \y\ < 2.5. The other explanations are the same 


as Fig. [3] 
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8 Systematic uncertainties 


discrete sets, and the response function is replaced by a response matrix. The observed distri¬ 
bution is then unfolded with the inverse of response matrix to obtain a distribution corrected 
for detector effects. Matrix inversion has potential complications, because it cannot handle 
large statistical fluctuations and the matrix itself could be singular. Instead, we use the RooUn- 
fold package ||33] with the D'Agostini iterative method l34l as the default algorithm and the 
singular value decomposition method 1551 for cross-checks. 

The default response matrix is obtained using the PYTHIA6 event generator. Statistical uncer¬ 
tainties are estimated from the square root of the covariance matrix obtained from a variation 
of the results generated by simulated experiments. 

The corrections for detector resolution and acceptance change the shape of the three-jet mass 
distributions by approximately 10%, less than 5% for the scaled energy of nonleading jets, and 
up to 20% for the scaled energy of the leading jet. For four-jet variables, corrections applied are 
of the order of 20% for the four-jet mass, 10% for Xbz, and less than 5% for cos Fnr- 


8 Systematic uncertainties 

The leading sources of systematic uncertainty are due to the JES, the JER, and the model depen¬ 
dence of corrections to the data. The distributions are presented in this analysis as normalized 
distributions, thus the absolute scale uncertainty of energy measurement does not play a sig¬ 
nificant role. There are insignificant contributions due to resolution of y. The main contribution 
of JES or JER to the uncertainty in the measurements is due to the migration of events from one 
category of jet multiplicity to the other. 

The effect of pileup in the measured distributions has been studied as a function of the number 
of reconstructed vertices in the event. None of the variables show any significant dependence 
on the pileup condition, so systematic uncertainty due to pileup can be neglected. 

8.1 Jet energy scale 

One of the dominant sources of systematic uncertainty is due to the jet energy scale corrections. 
The JES uncertainty has been estimated to be 2-2.5% for PF jets |[3C)i depending on the jet pj 
and rj. In order to map this uncertainty to the multijet variables, all jets in the selected events are 
systematically shifted by the respective uncertainties, and a new set of values for the multijet 
variables is calculated. This causes a migration of events from an event category of a given 
jet multiplicity to a different jet multiplicity. The migration could be as high as 20% for some 
of the event categories. The corresponding distributions are then unfolded using the standard 
procedure as described in Section [7] The difference of these values from the central unfolded 
results is a measure of the uncertainty owing to the JES. 

Uncertainties owing to the JES are found to be between 0.2-5.5% in the three-jet mass, and 0.3- 
10% in the four-jet mass. The systematic uncertainties are the largest at both ends of the mass 
spectra. The systematic uncertainties in scaled energy are between 0.1% and 2.0%, and those 
in angular variables are in the range 0.1-3.0%. There is a small increase in the uncertainty for 
distributions where there is at least one jet in the endcap region of the detector. 

8.2 Jet energy resolution 

The JER is measured in data using the pj balance in dijet events Il36ll . Based on these measure¬ 
ments, the resolution effects are corrected using simulated events. To study the effect of the 
difference between the simulated and the measured resolution, several sets of unfolded distri- 



8.3 Model dependence in unfolding 


9 


butions are obtained using response matrices from the default resolution matrix and changing 
the jet resolution within its estimated uncertainty. Alternatively the response matrix is con¬ 
structed by convolving the generator level distribution with the measured resolution. The mea¬ 
sured distribution is unfolded by this response matrix vis-a-vis the response matrix determined 
using fully simulated sample of PYTHIA6 events. These two estimates provide independent de¬ 
scriptions of the detector modeling and the difference is used as a measure of the systematic 
uncertainty due to detector performance. Position resolution affects the measurement of the jet 
direction, and it is estimated using simulated multijet events and validated with data. 

Uncertainties owing to the JER are found to be between 0.1-10% in the three-jet mass, 0.3-15% 
in the four-jet mass, 0.1-10% in the scaled jet energies and 0.2-8.2% in the angular variables. 

8.3 Model dependence in unfolding 

Unfolded distributions are obtained using two different response matrices derived from PYTHIA6 
and from MADGRAPH+PYTHIA6 simulations. The difference in the unfolded values, due to the 
choice of response functions, gives a measure of the systematic uncertainty. The uncertainties 
are at the level of 0.1-6.0% in the three-jet mass, 0.1-3.0% in the scaled energy distributions of 
the three-jet variables, 0.1-8.0% in the four-jet mass and 0.1-6.2% in the angular variables in the 
four-jet samples. The uncertainties in the scaled jet energy increases by a few percent for the 
samples with lower values of leading jet pj. 

Unfolding has been carried out using PYTHIA6 and MADGRAPH+PYTHIA6 samples, which has 
the same hadronization model. To test the effects of different hadronization models, MC sam¬ 
ples from HERWIG++, which provides a different PS and hadronization approach, are used. 
However, the simulated event sample generated using HERWIG++ is statistically inadequate 
to be used in a complete unfolding procedure. The difference between bin-by-bin correction 
factors obtained with PYTHIA6 and HERWIG++ is found to be somewhat larger than the un¬ 
certainty due to the difference in the unfolding matrices: 0.1-12% in the scaled energy dis¬ 
tributions of three-jet variables, 0.1-7.7% in the angular variables in the four-jet samples and 
0.1-11.6% in the jet masses. The larger values from the two estimates are chosen as the system¬ 
atic uncertainties due to unfolding. 

8.4 Event selection 

Jet candidates are required to pass certain criteria [j37i| designed to reduce unwanted detector 
effects. This analysis uses jets identified with very restrictive criteria on the ratio of the energy 
carried by neutral to that carried by charged particles. The effect of using these criteria is 
tested by reevaluating the same distributions with jets selected after relaxing the selection on 
the fractions of the energy carried by the neutral and the charged particles. Also, the selection 
on Ej' ss is changed, and the effect of this is estimated from the difference in the observed 
distributions. The uncertainty due to the event selection is found to be below 0.2%. 

8.5 Overall uncertainty 

The first three sources mentioned above are the dominant sources of systematic uncertainty 
The contributions to the uncertainty from the selection requirements and pileup effects are 
found to be negligible. The uncertainties are calculated for each bin of the measured distribu¬ 
tions and are added in quadrature. The overall systematic uncertainty is found to be smaller 
than the statistical uncertainty for most of the bins. Typical uncertainties for the six variables 
studied in this analysis are summarized in Table [3] 
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8 Systematic uncertainties 


Table 3: Uncertainty ranges among the different bins in the topological distributions of the 
three- and four-jet variables. 


Uncertainty source 

Uncertainty (%) for leading jet pj 


190-300GeV 

>500 GeV 

Three-jet mass 

Jet and event selection 

0.1 

0.1 

Jet energy scale 

0.3-5.0 

0.2-5.5 

Jet resolution 

0 .1-10.0 

0 .2-6.0 

Model dependence in unfolding 

0 .2-11.0 

0.2-5.0 

Total systematic uncertainty 

0.3-12.7 

0.2-7.9 

Statistical uncertainty 

1.4-14.5 

0.7-10.2 

Scaled energy of the leading jet 

Jet and event selection 

0.1 

0.1 

Jet energy scale 

0.1-1.9 

0.1-1.4 

Jet resolution 

0 .2-6.2 

0.1-5.4 

Model dependence in unfolding 

0 .1-6.0 

0.5-3.6 

Total systematic uncertainty 

0.8-7.2 

1.1-5.6 

Statistical uncertainty 

1.6-17.2 

0.6-14.2 

Scaled energy of the second-leading jet 

Jet and event selection 

0.1 

0.1 

Jet energy scale 

0 .1-2.0 

0 .1-2.0 

Jet resolution 

0.1-5.0 

0.1-4.2 

Model dependence in unfolding 

0.4-9.0 

0.1-3.5 

Total systematic uncertainty 

1.0-8.3 

0.1-4.6 

Statistical uncertainty 

1.3-16.4 

0.9-8.0 

Four-jet mass 

Jet and event selection 

0.1 

0.1 

Jet energy scale 

0.4-6.9 

0.3-7.0 

Jet resolution 

0.4-11.7 

0.2-4.9 

Model dependence in unfolding 

0.3-7.0 

0.5-8.1 

Total systematic uncertainty 

0.4-13.7 

0.5-11.6 

Statistical uncertainty 

3.1-30.9 

1.4-12.5 

Bengtsson- 

Zerwas angle 


Jet and event selection 

0.1 

0.1 

Jet energy scale 

0.1-3.0 

0.2-2.4 

Jet resolution 

0.4-5.4 

0.2-5.0 

Model dependence in unfolding 
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9 Results 

9.1 Comparison with models 

The normalized differential distributions, corrected for detector effects, are plotted as a function 
of the three- and four-jet inclusive variables and compared with predictions from the four MC 
models: PYTHIA6, PYTHIA8, MADGRAPH+PYTHIA6 and HERWIG++. The variables considered 
for these comparisons are three-jet mass, scaled energies of the leading and next-to-leading jet 
in the three-jet sample in the three-jet CM frame, four-jet mass, and the two angles Xbz and 0 nr. 

For the comparison plots (Figs. [5]-|9j), the upper panel shows the data and the model predictions 
with the corresponding statistical uncertainty. For the data, the shaded area shows the statis¬ 
tical and systematic uncertainties added in quadrature. The lower panels in each plot show 
the ratio of MC prediction to the data for each model. Comparisons are made for two different 
ranges of the leading jet pj: 190 < pj < 300 GeV and pj > 500 GeV. 

190 GeV< Leading jet p T < 300 GeV 5.1 fb’ 1 (7 TeV) Leading jet p T > 500 GeV 5.1 fb'’(7 TeV) 




Figure 5: Distribution of the three-jet mass superposed with predictions from four MC models: 
PYTHIA6, PYTHIA8, MADGRAPH+PYTHIA6, HERWIG++. The distributions are obtained from 
inclusive three-jet sample with the jets restricted in the \y\ region 0.0 < \y\ < 2.5, and with 
leading-jet pj between 190 and 300 GeV (a) or above 500 GeV (b). The data points are shown 
with statistical uncertainty only and the bands indicate the statistical and systematic uncertain¬ 
ties combined in quadrature. The lower panels of each plot show the ratios of MC predictions 
to the data. The ratios are shown with statistical uncertainty in the data as well as in the MC, 
while the band shows combined statistical and systematic uncertainties. 

Figure [5] shows the normalized corrected differential distribution as a function of the three-jet 
mass for two ranges of the leading-jet pj. The three-jet mass distribution broadens for larger 
pj thresholds. The models show varying degrees of success for the different ranges of leading- 
jet pj. Most models differ from the data in the low-mass spectrum. The PYTHIA6 simulation 
provides a good description of the data in the lower pj bin, while it has a larger deviation 
in the higher pj bin. The mean difference is at the level of 1.8-4.0 To. Predictions from Mad- 
GRAPH+PYTHIA6 and PYTHIA8 agree with the data to within 4.5%. HERWIG++ provides the 
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9 Results 


worst agreement among the four models - the mean difference is at the level of 4.0-15%. 


190 GeV < Leading jet p T < 300 GeV 5.1 fb 1 (7 TeV) 



3 



0.7 0.8 0.9 1 


X 

3 


Figure 6: Corrected normalized distribution of scaled energy of the leading-jet in the inclusive 
three-jet sample. The other explanations are the same as Fig. [5j 

Figure [6] shows the corrected normalized differential distribution as a function of the scaled 
leading-jet energy in the inclusive three-jet sample. The distributions peak close to 1 and the 
peaks get sharper for higher leading-jet pj range. The scaled leading-jet energy X 3 is expected 
to follow a linear rise from | to 1 for a phase space model, which has only energy-momentum 
conservation, while QCD predicts a deviation from linearity at higher values of x$. This feature 
is observed in the data, particularly for higher pj bins. Only MADGRAPH+PYTHIA6 provides 
a consistent description of the data. The agreement improves for the sample with leading-jet 
pj above 500 GeV. The difference between the predictions from MadGraph+pythia6 and the 
data are at the level of 3.5-6.1%. 


Figure [7] shows the corrected normalized differential distribution as a function of the scaled 
energy of the second-leading jet, X 4 , in the inclusive three-jet sample. For kinematic reasons, 
X 4 is expected to lie between 1/2 and 1. The distribution peaks around 0.65 for the low pj 
threshold sample. The peak shifts to higher values of X 4 and the distribution becomes broader 
for the larger pj threshold sample. Predictions from MADGRAPH+PYTHIA6 agree with data to 
within 3.1%. Predictions from PYTHIA6 as well as PYTHIA8 deviate by as much as 10% or more 
from the data. Predictions from HERWIG++ also shows a large deviation at higher pj bins. 

Figure [8] shows comparisons of the corrected normalized differential distribution as a function 
of the four-jet mass for the four MC models. The distribution broadens at higher minimum pj 
value. As can be seen from the figure, HERWIG++ provides the worst comparison. The average 
deviations are at the level of 15% for many of the distributions, particularly for the sample with 
leading-jet pj between 190 and 300 GeV. The level of agreement for the other three MC models 
is better than 10% over the entire pj region. 

The sub-leading jets in the four-jet event category are predominantly due to the secondary 
splitting of partons. In case of gluon splitting, they can be due to a qq pair or gluons. Both the 
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Figure 7: Corrected normalized distribution of scaled energy of the second-leading jet in the 
inclusive three-jet sample. The other explanations are the same as Fig. [5] 


190 GeV < Leading jet p T < 300 GeV 5.1 fb'(7TeV) 



Leading jet p T > 500 GeV 5.1 fb'(7 TeV) 



Figure 8: Corrected normalized distribution of four-jet mass. The other explanations are the 
same as Fig. [5] 
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9 Results 


angular distributions, @nr and ^BZ/ are different for these two scenarios and are representative 
of the colour factors for these couplings. 


190 GeV< Leading jet p T < 300 GeV 5.1 fb (7TeV) 
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Figure 9: Corrected normalized distribution of the Bengtsson-Zerwas angle. The other expla¬ 
nations are the same as Fig. [5] 

Figure [9] shows similar comparisons for the Bengtsson-Zerwas angle. Because the azimuthal 
angle is not defined for the back-to-back jets, the opening angle between the two leading and 
two nonleading jets is required to be less than 160°. As can be seen from the average deviation 
of the ratios from unity, predictions from MADGRAPH+PYTHIA6 and HERWIG++ represent the 
data well, while those from PYTHIA6 do poorly. 

Figure [lO] shows the corrected normalized differential distribution as a function of the cosine 
of the Nachtmann-Reiter angle in the inclusive four-jet sample. Most of the models follow the 
broad features of the data. However, the degree of agreement with data is different among 
models. MADGRAPH+PYTHIA6 provides the best description of the data; HERWIG++ with an¬ 
gular ordering in the parton shower is close to the data (the agreement is better than 5%), while 
PYTHIA6 has the largest deviation (the agreement is typically between 10-12%). 


9.2 Effect of hadronization, underlying event, and PDFs 

The disagreement between data and the MC models may arise from the implementation of 
nonperturbative components in the simulation due to the fragmentation model or the choice 
of PDF set. These effects have been investigated by studying the uncertainties due to hadroni¬ 
zation model and PDF parametrization. 

The MC models have different ways of modeling the underlying events and hadronization of 
the partons into hadrons. This may result in different predictions of the distributions of multijet 
variables depending on whether they are computed at the hadron or at the parton level. This 
effect has been investigated by studying two different MC models: PYTHIA6 and HERWIG++. 
This is done by evaluating the distributions at the parton and hadron level. PYTHIA6 uses 
the Lund string model, while HERWIG++ uses the cluster model. Also, colour reconnections 
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Figure 10: Corrected normalized distribution of the cosine of the Nachtmann-Reiter angle. The 
other explanations are the same as Fig. [5j 


are done differently in the two models. A generator-level study is carried out for both these 
models, where the effect of hadronization is studied using distributions from jets at parton- 
and hadron-level. The ratio of the parton- to the hadron-level distribution is then compared. 
The mean difference between the two hadronization models is typically less than 5%. 

Comparisons are also made to different tunes of the underlying event models within PYTHIA6. 
The tunes (D6T, DW, P0, Zl, Z2, Z2*) |10 11. 380401 differ in the cutoff used to regularize the 
1 / pj divergence for final-state partons, the ordering of the showers (virtuality ordering vs. pj 
ordering), multiparton interaction model, PDFs, and data sets used in the tune. The resulting 
distributions agree typically within 5%, so the disagreements with the data cannot be fully 
explained by this effect. 

The MC models use CTEQ6 as the default PDF parametrization. There are many different PDF 
sets, which are based on different input data, assumptions, and parametrizations. Thus any 
calculation of a cross section or distributions in the simulation depends on the choice of PDF 
set. Also, each PDF set has its own errors from its parametric assumptions and data input to 
fitting. The effect of the PDF set choice on the multijet variables is calculated according to the 
recommendation of PDF4LHC group |j4Tll42ll . Since comparisons are made only with leading 
order Monte Carlo models in this paper, only two leading order PDF sets are used in this 
comparison: CTEQ61 and MSTW20081o68cl l43l . The uncertainties are found to be typically at 
the level of 1.0-2.0% depending on the variable type and pj range considered. 


10 Summary 

Distributions of topological variables for inclusive three- and four-jet events in pp collisions 
measured with the CMS detector at a centre-of-mass energy of 7TeV were presented using a 
data sample corresponding to an integrated luminosity of 5.1 fb -1 . The distributions were cor- 
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10 Summary 


rected for detector effects, and systematic uncertainties were estimated. These corrected dis¬ 
tributions were compared with the predictions from four LO MC models: PYTHIA6, PYTHIA8, 
HERWIG++, and MadGraph+pythia6. 

Distributions of three- and four-jet invariant mass from all models show significant deviation 
from the data at high mass. The fact that all models have a common PDF suggests that the PDF 
errors at high mass are underestimated. The PDFs at high invariant mass have recently been 
constrained by CMS using dijet pj distributions @4). 

The MadGraph simulations are based on tree-level calculations for two-, three-, and four- 
parton final states, while PYTHIA and HERWIG++ can have only two partons in the final state 
before showering. Not surprisingly, the three-jet predictions of MADGRAPH+PYTHIA6 give a 
more consistent description of the distributions studied in this analysis. The notable exception 
is at high X 4 (the next-to-leading jet), where two jets carry most of the CM energy. The difference 
is probably due to a double counting of three-parton with two-parton (with a parton from 
showering) final states. 

The PYTHIA and HERWIG++ models give poor descriptions of the energy fractions in the three- 
jet final state. In particular, the distributions of X 3 (the leading jet) show large shape differences 
between data and theory that are inconsistent with PDFs or hadronization model uncertain¬ 
ties. Since the distributions from MADGRAPH+PYTHIA6 agree with those from the data, the 
discrepancies with PYTHIA and HERWIG++ are likely due to missing higher multiplicity ME, 
which are present in MadGraph. 

All the models compared in this study do remarkably well describing the four-jet Bengtsson- 
Zerwas angle. The PYTHIA models have some systematic deviation from the data in describing 
the Nachtmann-Reiter angle. Parton showers with angular ordering, as implemented in HER- 
WIG++, yield a better agreement with the measured data for these angular variables. 
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